Introduction

We want to cluster football players.

Description and cleaning

The data we analyze are a table of football players with their FIFA ratings, taken from sofifa.com. For each player there is general information (club, league, position, age, market value, wage) as well as a set of numeric skill ratings. Here we only briefly recall the data (see the previous report on classification for details).

## [1] 19239   107

For example, this is what the rows for Lionel Messi and Cristiano Ronaldo (rows 1 and 3) look like:

variable index 1 (L. Messi) 3 (Cristiano Ronaldo)
sofifa_id 1 158023 20801
player_url 2 https://sofifa.com/player/158023/lionel-messi/220002 https://sofifa.com/player/20801/c-ronaldo-dos-santos-aveiro/220002
short_name 3 L. Messi Cristiano Ronaldo
long_name 4 Lionel Andrés Messi Cuccittini Cristiano Ronaldo dos Santos Aveiro
player_positions 5 RW, ST, CF ST, LW
overall 6 93 91
potential 7 93 91
value_eur 8 78000000 45000000
wage_eur 9 320000 270000
age 10 34 36
dob 11 1987-06-24 1985-02-05
height_cm 12 170 187
weight_kg 13 72 83
club_name 14 Paris Saint-Germain Manchester United
league_name 15 French Ligue 1 English Premier League
league_level 16 1 1
club_position 17 RW ST
club_jersey_number 18 30 7
club_loaned_from 19
club_joined 20 2021-08-10 2021-08-27
club_contract_valid_until 21 2023 2023
nationality 22 Argentina Portugal
nation_position 23 RW ST
nation_jersey_number 24 10 7
preferred_foot 25 Left Right
weak_foot 26 4 4
skill_moves 27 4 5
international_reputation 28 5 5
work_rate 29 Medium/Low High/Low
body_type 30 Unique Unique
real_face 31 Yes Yes
release_clause_eur 32 144300000 83300000
player_tags 33 #Dribbler, #Distance Shooter, #FK Specialist, #Acrobat, #Clinical Finisher, #Complete Forward #Aerial Threat, #Dribbler, #Distance Shooter, #Crosser, #Acrobat, #Clinical Finisher, #Complete Forward
player_traits 34 Finesse Shot, Long Shot Taker (AI), Playmaker (AI), Outside Foot Shot, One Club Player, Chip Shot (AI), Technical Dribbler (AI) Power Free-Kick, Flair, Long Shot Taker (AI), Speed Dribbler (AI), Outside Foot Shot
pace 35 85 87
shooting 36 92 94
passing 37 91 80
dribbling 38 95 87
defending 39 34 34
physic 40 65 75
attacking_crossing 41 85 87
attacking_finishing 42 95 95
attacking_heading_accuracy 43 70 90
attacking_short_passing 44 91 80
attacking_volleys 45 88 86
skill_dribbling 46 96 88
skill_curve 47 93 81
skill_fk_accuracy 48 94 84
skill_long_passing 49 91 77
skill_ball_control 50 96 88
movement_acceleration 51 91 85
movement_sprint_speed 52 80 88
movement_agility 53 91 86
movement_reactions 54 94 94
movement_balance 55 95 74
power_shot_power 56 86 94
power_jumping 57 68 95
power_stamina 58 72 77
power_strength 59 69 77
power_long_shots 60 94 93
mentality_aggression 61 44 63
mentality_interceptions 62 40 29
mentality_positioning 63 93 95
mentality_vision 64 95 76
mentality_penalties 65 75 88
mentality_composure 66 96 95
defending_marking_awareness 67 20 24
defending_standing_tackle 68 35 32
defending_sliding_tackle 69 24 24
goalkeeping_diving 70 6 7
goalkeeping_handling 71 11 11
goalkeeping_kicking 72 15 15
goalkeeping_positioning 73 14 14
goalkeeping_reflexes 74 8 11
goalkeeping_speed 75 NA NA
ls 76 89+3 90+1
st 77 89+3 90+1
rs 78 89+3 90+1
lw 79 92 88
lf 80 93 89
cf 81 93 89
rf 82 93 89
rw 83 92 88
lam 84 93 86+3
cam 85 93 86+3
ram 86 93 86+3
lm 87 91+2 86+3
lcm 88 87+3 78+3
cm 89 87+3 78+3
rcm 90 87+3 78+3
rm 91 91+2 86+3
lwb 92 66+3 63+3
ldm 93 64+3 59+3
cdm 94 64+3 59+3
rdm 95 64+3 59+3
rwb 96 66+3 63+3
lb 97 61+3 60+3
lcb 98 50+3 53+3
cb 99 50+3 53+3
rcb 100 50+3 53+3
rb 101 61+3 60+3
gk 102 19+3 20+3
player_face_url 103 https://cdn.sofifa.com/players/158/023/22_120.png https://cdn.sofifa.com/players/020/801/22_120.png
club_logo_url 104 https://cdn.sofifa.com/teams/73/60.png https://cdn.sofifa.com/teams/11/60.png
club_flag_url 105 https://cdn.sofifa.com/flags/fr.png https://cdn.sofifa.com/flags/gb-eng.png
nation_logo_url 106 https://cdn.sofifa.com/teams/1369/60.png https://cdn.sofifa.com/teams/1354/60.png
nation_flag_url 107 https://cdn.sofifa.com/flags/ar.png https://cdn.sofifa.com/flags/pt.png

Player selection

Russia

Let us find all Russian players and look at the table. These players will be useful later for seeing where they end up after clustering. We also remove players with no wage information (they have no fixed club or league, which will matter later).

short_name club_name league_name club_position overall potential value_eur wage_eur
221 Mário Fernandes PFC CSKA Moscow Russian Premier League RB 82 82 26500000 57000
391 I. Akinfeev PFC CSKA Moscow Russian Premier League GK 80 80 2300000 26000
617 A. Golovin AS Monaco French Ligue 1 LF 79 83 24500000 53000
759 R. Zobnin Spartak Moskva Russian Premier League RDM 78 80 15000000 51000
766 A. Miranchuk Atalanta Italian Serie A SUB 78 80 17500000 46000
807 A. Lunev Bayer 04 Leverkusen German 1. Bundesliga SUB 78 79 11000000 39000
882 Guilherme FC Lokomotiv Moscow Russian Premier League GK 77 77 1200000 22000
1034 G. Dzhikiya Spartak Moskva Russian Premier League LCB 77 79 11000000 48000
1180 F. Smolov FC Lokomotiv Moscow Russian Premier League RS 76 76 6500000 47000
1181 A. Dzagoev PFC CSKA Moscow Russian Premier League CAM 76 76 6000000 40000
1251 D. Cheryshev Valencia CF Spain Primera Division LM 76 76 7000000 31000
1325 A. Miranchuk FC Lokomotiv Moscow Russian Premier League SUB 76 79 10000000 42000
1561 A. Kokorin Fiorentina Italian Serie A SUB 75 75 5500000 48000
1806 D. Barinov FC Lokomotiv Moscow Russian Premier League RCM 75 82 10500000 33000
1890 A. Sobolev Spartak Moskva Russian Premier League ST 75 80 8500000 45000
2032 G. Schennikov PFC CSKA Moscow Russian Premier League SUB 74 74 3600000 33000
2283 Z. Bakaev Spartak Moskva Russian Premier League SUB 74 78 6000000 41000
2306 R. Zhemaletdinov FC Lokomotiv Moscow Russian Premier League RM 74 78 6000000 33000
2307 A. Maksimenko Spartak Moskva Russian Premier League GK 74 81 7000000 26000
2320 F. Chalov PFC CSKA Moscow Russian Premier League ST 74 80 6500000 34000
2932 I. Oblyakov PFC CSKA Moscow Russian Premier League LB 73 80 6000000 29000
3039 I. Diveev PFC CSKA Moscow Russian Premier League RCB 73 82 6500000 21000
3166 V. Vasin PFC CSKA Moscow Russian Premier League SUB 72 72 1500000 26000
3386 A. Selikhov Spartak Moskva Russian Premier League SUB 72 74 2100000 26000
3509 I. Akhmetov PFC CSKA Moscow Russian Premier League RDM 72 78 3700000 25000
3542 D. Zhivoglyadov FC Lokomotiv Moscow Russian Premier League SUB 72 72 2300000 29000
3544 S. Iljutcenko Jeonbuk Hyundai Motors Korean K League 1 ST 72 72 2300000 10000
3694 K. Kuchaev PFC CSKA Moscow Russian Premier League RM 72 79 4700000 25000
3877 F. Kudryashov Antalyaspor Turkish Süper Lig SUB 71 71 700000 11000
3898 K. Rausch 1. FC Nürnberg German 2. Bundesliga SUB 71 71 1400000 8000
4078 I. Kutepov Spartak Moskva Russian Premier League SUB 71 72 1900000 28000
4215 R. Mirzov Spartak Moskva Russian Premier League SUB 71 71 1900000 32000
4521 N. Rasskazov Spartak Moskva Russian Premier League RB 71 76 2600000 24000
4554 S. Magkeev FC Lokomotiv Moscow Russian Premier League RCB 71 80 4000000 22000
4602 A. Rebrov Spartak Moskva Russian Premier League SUB 70 70 180000 13000
4621 K. Nababkin PFC CSKA Moscow Russian Premier League SUB 70 70 525000 20000
4624 A. Eschenko Spartak Moskva Russian Premier League SUB 70 70 350000 18000
4695 E. Prib Fortuna Düsseldorf German 2. Bundesliga LDM 70 70 1300000 14000
4714 A. Zabolotnyi PFC CSKA Moscow Russian Premier League SUB 70 70 1600000 24000
5338 D. Kulikov FC Lokomotiv Moscow Russian Premier League LCM 70 79 3300000 19000
5369 D. Rybchinskiy FC Lokomotiv Moscow Russian Premier League LM 70 78 3600000 21000
5370 N. Umyarov Spartak Moskva Russian Premier League SUB 70 79 3400000 18000
5945 A. Zhirov SV Sandhausen German 2. Bundesliga LCB 69 69 1100000 4000
6282 K. Maradishvili FC Lokomotiv Moscow Russian Premier League RES 69 77 3000000 13000
6283 P. Maslov Spartak Moskva Russian Premier League RES 69 78 3000000 15000
6403 A. Silyanov FC Lokomotiv Moscow Russian Premier League RB 69 78 2900000 13000
6539 E. Bashkirov Zagłębie Lubin Polish T-Mobile Ekstraklasa RDM 68 68 1000000 4000
6812 A. Vasyutin Djurgårdens IF Swedish Allsvenskan SUB 68 73 1400000 3000
6842 N. Haikin FK Bodø/Glimt Norwegian Eliteserien GK 68 73 1400000 2000
7073 I. Zlobin Futebol Clube de Famalicão Portuguese Liga ZON SAGRES SUB 68 76 2300000 3000
7437 M. Mukhin PFC CSKA Moscow Russian Premier League LDM 68 79 2600000 9000
8255 M. Suleymanov GZT Giresunspor Turkish Süper Lig SUB 67 71 1500000 5000
9347 I. Zhigulev Zagłębie Lubin Polish T-Mobile Ekstraklasa SUB 66 71 1200000 3000
9507 N. Tiknizyan FC Lokomotiv Moscow Russian Premier League RES 66 77 1900000 11000
9518 A. Lomovitskiy Spartak Moskva Russian Premier League SUB 66 74 1900000 12000
10292 G. Melkadze Spartak Moskva Russian Premier League SUB 65 70 1100000 11000
10540 I. Shinozuka Kashiwa Reysol Japanese J. League Division 1 RES 65 66 850000 3000
10694 M. Ignatov Spartak Moskva Russian Premier League SUB 65 78 1800000 9000
10746 I. Gaponov Spartak Moskva Russian Premier League RES 65 74 1500000 10000
11038 M. Nenakhov FC Lokomotiv Moscow Russian Premier League RES 65 72 1400000 9000
11306 A. Mitryushkin SG Dynamo Dresden German 2. Bundesliga SUB 64 69 675000 3000
11764 L. Klassen WSG Tirol Austrian Football Bundesliga LB 64 71 1100000 2000
11938 V. Karpov PFC CSKA Moscow Russian Premier League RES 64 79 1300000 2000
13233 E. Shlyakov AFC UTA Arad Romanian Liga I LB 63 63 425000 2000
13240 E. Sevikyan Levante Unión Deportiva Spain Primera Division RES 63 77 1100000 3000
14148 N. Iosifov Villarreal CF Spain Primera Division RES 62 75 950000 4000
14366 S. Babkin FC Lokomotiv Moscow Russian Premier League SUB 62 77 925000 3000
14393 V. Yakovlev PFC CSKA Moscow Russian Premier League RES 62 75 950000 5000
15876 A. Savin FC Lokomotiv Moscow Russian Premier League SUB 60 70 475000 3000
16442 Y. Mikhailov FC Schalke 04 German 2. Bundesliga SUB 59 76 575000 750
16794 V. Molchan Stade Malherbe Caen French Ligue 2 RES 58 67 425000 750
16890 V. Cherny DSC Arminia Bielefeld German 1. Bundesliga SUB 58 76 525000 1000
17476 A. Chernov Vejle Boldklub Danish Superliga SUB 56 66 275000 1000
17597 D. Markitesov Spartak Moskva Russian Premier League RES 56 73 375000 6000
17724 A. Thomas Seattle Sounders FC USA Major League Soccer RES 56 63 250000 850
17966 D. Bokov PFC CSKA Moscow Russian Premier League SUB 55 74 275000 500
18109 D. Khudyakov FC Lokomotiv Moscow Russian Premier League SUB 55 75 300000 500
18434 T. Akmurzin Spartak Moskva Russian Premier League RES 53 63 180000 4000
18487 V. Torop PFC CSKA Moscow Russian Premier League SUB 53 75 275000 500
18683 A. Poplevchenkov Spartak Moskva Russian Premier League RES 52 66 170000 3000
18853 I. Repyakh Vejle Boldklub Danish Superliga RES 52 66 190000 1000

Number of Russian players: 81.
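The selection above might be sketched as follows. This is a minimal illustration on a toy table, not the report's actual code: the rows and the player name "Free Agent" are made up, and only the column names (`nationality`, `wage_eur`, `club_name`) come from the dataset.

```r
# Toy stand-in for the full player table; the values are illustrative only.
players <- data.frame(
  short_name  = c("A. Golovin", "L. Messi", "I. Akinfeev", "Free Agent"),
  nationality = c("Russia", "Argentina", "Russia", "Russia"),
  club_name   = c("AS Monaco", "Paris Saint-Germain", "PFC CSKA Moscow", NA),
  wage_eur    = c(53000, 320000, 26000, NA),
  stringsAsFactors = FALSE
)

# Drop players without wage information (they have no club or league).
players <- players[!is.na(players$wage_eur), ]

# Keep the Russian players to inspect their clusters later.
russians <- players[players$nationality == "Russia", ]
nrow(russians)
```

Filtering on `wage_eur` first means the Russian subset inherits the same cleaning as the rest of the table.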

World

We take the 50 best players by the FIFA overall rating (the first 50 rows).

1:50 short_name club_name league_name club_position overall potential value_eur wage_eur
1 L. Messi Paris Saint-Germain French Ligue 1 RW 93 93 78000000 320000
2 R. Lewandowski FC Bayern München German 1. Bundesliga ST 92 92 119500000 270000
3 Cristiano Ronaldo Manchester United English Premier League ST 91 91 45000000 270000
4 Neymar Jr Paris Saint-Germain French Ligue 1 LW 91 91 129000000 270000
5 K. De Bruyne Manchester City English Premier League RCM 91 91 125500000 350000
6 J. Oblak Atlético de Madrid Spain Primera Division GK 91 93 112000000 130000
7 K. Mbappé Paris Saint-Germain French Ligue 1 ST 91 95 194000000 230000
8 M. Neuer FC Bayern München German 1. Bundesliga GK 90 90 13500000 86000
9 M. ter Stegen FC Barcelona Spain Primera Division GK 90 92 99000000 250000
10 H. Kane Tottenham Hotspur English Premier League ST 90 90 129500000 240000
11 N. Kanté Chelsea English Premier League RCM 90 90 100000000 230000
12 K. Benzema Real Madrid CF Spain Primera Division CF 89 89 66000000 350000
13 T. Courtois Real Madrid CF Spain Primera Division GK 89 91 85500000 250000
14 H. Son Tottenham Hotspur English Premier League LW 89 89 104000000 220000
15 Casemiro Real Madrid CF Spain Primera Division CDM 89 89 88000000 310000
16 V. van Dijk Liverpool English Premier League LCB 89 89 86000000 230000
17 S. Mané Liverpool English Premier League LW 89 89 101000000 270000
18 M. Salah Liverpool English Premier League RW 89 89 101000000 270000
19 Ederson Manchester City English Premier League GK 89 91 94000000 200000
20 J. Kimmich FC Bayern München German 1. Bundesliga RDM 89 90 108000000 160000
21 Alisson Liverpool English Premier League GK 89 90 82000000 190000
22 G. Donnarumma Paris Saint-Germain French Ligue 1 GK 89 93 119500000 110000
23 Sergio Ramos Paris Saint-Germain French Ligue 1 LCB 88 88 24000000 115000
24 L. Suárez Atlético de Madrid Spain Primera Division RS 88 88 44500000 135000
25 T. Kroos Real Madrid CF Spain Primera Division LCM 88 88 75000000 310000
26 R. Lukaku Chelsea English Premier League ST 88 88 93500000 260000
27 K. Navas Paris Saint-Germain French Ligue 1 SUB 88 88 15500000 130000
28 R. Sterling Manchester City English Premier League SUB 88 89 107500000 290000
29 Bruno Fernandes Manchester United English Premier League CAM 88 89 107500000 250000
30 E. Haaland Borussia Dortmund German 1. Bundesliga RS 88 93 137500000 110000
31 S. Agüero FC Barcelona Spain Primera Division ST 87 87 51000000 260000
32 H. Lloris Tottenham Hotspur English Premier League GK 87 87 13500000 125000
33 L. Modrić Real Madrid CF Spain Primera Division RCM 87 87 32000000 190000
34 A. Di María Paris Saint-Germain French Ligue 1 SUB 87 87 49500000 160000
35 W. Szczęsny Juventus Italian Serie A GK 87 87 42000000 105000
36 T. Müller FC Bayern München German 1. Bundesliga CAM 87 87 66000000 140000
37 C. Immobile Lazio Italian Serie A ST 87 87 67500000 125000
38 P. Pogba Manchester United English Premier League RDM 87 87 79500000 220000
39 M. Verratti Paris Saint-Germain French Ligue 1 LCM 87 87 79500000 155000
40 Marquinhos Paris Saint-Germain French Ligue 1 RCB 87 90 90500000 135000
41 L. Goretzka FC Bayern München German 1. Bundesliga LDM 87 88 93000000 140000
42 P. Dybala Juventus Italian Serie A CAM 87 88 93000000 160000
43 A. Robertson Liverpool English Premier League LB 87 88 83500000 175000
44 F. de Jong FC Barcelona Spain Primera Division RCM 87 92 119500000 210000
45 T. Alexander-Arnold Liverpool English Premier League RB 87 92 114000000 150000
46 J. Sancho Manchester United English Premier League LM 87 91 116500000 150000
47 Rúben Dias Manchester City English Premier League RCB 87 91 102500000 170000
48 G. Chiellini Juventus Italian Serie A SUB 86 86 12000000 88000
49 S. Handanovič Inter Italian Serie A GK 86 86 7500000 78000
50 M. Hummels Borussia Dortmund German 1. Bundesliga LCB 86 86 44000000 95000

Clustering: preliminary considerations

There are quite a few positions in football, especially in the classification used in this dataset:

##  [1] RW  ST  LW  RCM GK  CF  CDM LCB RDM RS  LCM SUB CAM RCB LDM LB  RB  LM  RM 
## [20] LS  CB  RES     RWB RF  CM  LWB LAM LF  RAM
## 30 Levels:  CAM CB CDM CF CM GK LAM LB LCB LCM LDM LF LM LS LW LWB RAM ... SUB

R and L stand for right and left, F and B for forward and back, C for center, S for striker, M for midfielder and W for wing.

Setting goalkeepers aside, we usually talk about defense, midfield and attack. The dataset contains attributes that could potentially help determine a player's likely position.

Let us see which attributes these are:

##  [1] "pace"                        "shooting"                   
##  [3] "passing"                     "dribbling"                  
##  [5] "defending"                   "physic"                     
##  [7] "attacking_crossing"          "attacking_finishing"        
##  [9] "attacking_heading_accuracy"  "attacking_short_passing"    
## [11] "attacking_volleys"           "skill_dribbling"            
## [13] "skill_curve"                 "skill_fk_accuracy"          
## [15] "skill_long_passing"          "skill_ball_control"         
## [17] "movement_acceleration"       "movement_sprint_speed"      
## [19] "movement_agility"            "movement_reactions"         
## [21] "movement_balance"            "power_shot_power"           
## [23] "power_jumping"               "power_stamina"              
## [25] "power_strength"              "power_long_shots"           
## [27] "mentality_aggression"        "mentality_interceptions"    
## [29] "mentality_positioning"       "mentality_vision"           
## [31] "mentality_penalties"         "mentality_composure"        
## [33] "defending_marking_awareness" "defending_standing_tackle"  
## [35] "defending_sliding_tackle"    "goalkeeping_diving"         
## [37] "goalkeeping_handling"        "goalkeeping_kicking"        
## [39] "goalkeeping_positioning"     "goalkeeping_reflexes"

It seems that these attributes should separate attacking players well from defenders and midfielders. Midfielders can be regarded here as universal players. Nothing in the attributes hints at the right or left flank or at right- or left-footedness, so we hope this factor will not drive the clusters.

To keep the clustering from being driven by age, potential or overall level of play, we do not include these features either.

Since some attributes are missing for goalkeepers, and goalkeepers clearly differ from outfield players anyway, we exclude them from consideration. We keep the attributes whose names start with “goalkeeping”, since they may help distinguish defenders.

## [1] FALSE FALSE FALSE FALSE FALSE  TRUE
## [1] 2132

There are not that many of them.

Another problem is that the sample is large and may contain heterogeneity that we would like to avoid. For example, in lower leagues the boundaries between player roles may be more blurred. Let us see how many players remain if we keep only players from top-league clubs.
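A goalkeeper filter of this kind could look roughly like the sketch below. It assumes that a goalkeeper is identified by `player_positions` being exactly "GK"; the report does not show which column was actually used, and the toy rows are illustrative.

```r
# Toy stand-in for the player table; the values are illustrative only.
players <- data.frame(
  short_name       = c("L. Messi", "J. Oblak", "K. Navas", "V. van Dijk"),
  player_positions = c("RW, ST, CF", "GK", "GK", "CB"),
  club_position    = c("RW", "GK", "SUB", "LCB"),
  stringsAsFactors = FALSE
)

# Assumption: a goalkeeper is a player whose listed positions are exactly "GK".
# club_position alone would miss substitute goalkeepers such as the third row.
is_gk <- players$player_positions == "GK"
sum(is_gk)

# Outfield players only.
field_players <- players[!is_gk, ]
```

Using `player_positions` rather than `club_position` matters because many players are listed as SUB or RES at their clubs.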

## [1] TRUE TRUE TRUE TRUE TRUE TRUE
## [1] 14857
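One possible way to express the top-league filter is sketched below. It assumes that "top league" means `league_level == 1`, which the report does not state explicitly; the rows and the name "No Club" are made up for illustration.

```r
# Toy stand-in for the player table; the values are illustrative only.
players <- data.frame(
  short_name   = c("L. Messi", "K. Rausch", "A. Golovin", "No Club"),
  league_level = c(1, 2, 1, NA),
  stringsAsFactors = FALSE
)

# Assumption: top-league clubs are those with league_level equal to 1.
# %in% maps a missing league_level to FALSE instead of propagating NA.
in_top_league <- players$league_level %in% 1
sum(in_top_league)

top_players <- players[in_top_league, ]
```

With `==` instead of `%in%`, a missing `league_level` would produce an `NA` row in the subset, so `%in%` is the safer choice here.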

Also, these features must not contain NA. We will remove such rows later; there are few of them.

Thus, this many players remain:

## [1] 13193

Preliminary analysis

There are many features, so we carry out only a minimal analysis. We build two data frames: one with the features of interest and another with general information about each player, so that the results are convenient to analyze later. We remove NA, as promised.

## [1] 13193    40
## [1] 13193    10
##       pace          shooting       passing        dribbling       defending    
##  Min.   :28.00   Min.   :18.0   Min.   :25.00   Min.   :26.00   Min.   :15.00  
##  1st Qu.:62.00   1st Qu.:42.0   1st Qu.:51.00   1st Qu.:57.00   1st Qu.:38.00  
##  Median :69.00   Median :55.0   Median :58.00   Median :64.00   Median :56.00  
##  Mean   :68.33   Mean   :52.8   Mean   :57.88   Mean   :63.03   Mean   :52.03  
##  3rd Qu.:76.00   3rd Qu.:64.0   3rd Qu.:65.00   3rd Qu.:70.00   3rd Qu.:65.00  
##  Max.   :97.00   Max.   :94.0   Max.   :93.00   Max.   :95.00   Max.   :91.00  
##      physic      attacking_crossing attacking_finishing
##  Min.   :29.00   Min.   :15.00      Min.   :10.00      
##  1st Qu.:59.00   1st Qu.:45.00      1st Qu.:37.00      
##  Median :66.00   Median :56.00      Median :53.00      
##  Mean   :64.89   Mean   :54.56      Mean   :50.65      
##  3rd Qu.:72.00   3rd Qu.:65.00      3rd Qu.:64.00      
##  Max.   :90.00   Max.   :94.00      Max.   :95.00      
##  attacking_heading_accuracy attacking_short_passing attacking_volleys
##  Min.   :17.00              Min.   :23.00           Min.   :10.00    
##  1st Qu.:48.00              1st Qu.:58.00           1st Qu.:35.00    
##  Median :57.00              Median :64.00           Median :47.00    
##  Mean   :56.75              Mean   :63.41           Mean   :46.89    
##  3rd Qu.:65.00              3rd Qu.:70.00           3rd Qu.:58.00    
##  Max.   :93.00              Max.   :94.00           Max.   :90.00    
##  skill_dribbling  skill_curve    skill_fk_accuracy skill_long_passing
##  Min.   :18.0    Min.   :12.00   Min.   :10.00     Min.   :20.00     
##  1st Qu.:55.0    1st Qu.:40.00   1st Qu.:34.00     1st Qu.:50.00     
##  Median :63.0    Median :52.00   Median :44.00     Median :59.00     
##  Mean   :61.4    Mean   :51.93   Mean   :46.27     Mean   :57.09     
##  3rd Qu.:70.0    3rd Qu.:64.00   3rd Qu.:58.00     3rd Qu.:66.00     
##  Max.   :96.0    Max.   :94.00   Max.   :94.00     Max.   :93.00     
##  skill_ball_control movement_acceleration movement_sprint_speed
##  Min.   :24.00      Min.   :27.00         Min.   :27.00        
##  1st Qu.:58.00      1st Qu.:62.00         1st Qu.:63.00        
##  Median :65.00      Median :69.00         Median :69.00        
##  Mean   :63.91      Mean   :68.29         Mean   :68.34        
##  3rd Qu.:70.00      3rd Qu.:76.00         3rd Qu.:76.00        
##  Max.   :96.00      Max.   :97.00         Max.   :97.00        
##  movement_agility movement_reactions movement_balance power_shot_power
##  Min.   :27.00    Min.   :29.00      Min.   :26.00    Min.   :20.00   
##  1st Qu.:59.00    1st Qu.:56.00      1st Qu.:60.00    1st Qu.:51.00   
##  Median :68.00    Median :62.00      Median :68.00    Median :61.00   
##  Mean   :66.68    Mean   :62.36      Mean   :66.94    Mean   :59.65   
##  3rd Qu.:75.00    3rd Qu.:68.00      3rd Qu.:75.00    3rd Qu.:70.00   
##  Max.   :96.00    Max.   :94.00      Max.   :96.00    Max.   :95.00   
##  power_jumping   power_stamina  power_strength  power_long_shots
##  Min.   :29.00   Min.   :24.0   Min.   :19.00   Min.   :11.00   
##  1st Qu.:58.00   1st Qu.:61.0   1st Qu.:58.00   1st Qu.:40.00   
##  Median :66.00   Median :68.0   Median :67.00   Median :54.00   
##  Mean   :65.77   Mean   :67.4   Mean   :65.59   Mean   :51.59   
##  3rd Qu.:74.00   3rd Qu.:75.0   3rd Qu.:74.00   3rd Qu.:64.00   
##  Max.   :95.00   Max.   :97.0   Max.   :96.00   Max.   :94.00   
##  mentality_aggression mentality_interceptions mentality_positioning
##  Min.   :20.00        Min.   :10.00           Min.   :12.00        
##  1st Qu.:50.00        1st Qu.:35.00           1st Qu.:48.00        
##  Median :61.00        Median :56.00           Median :58.00        
##  Mean   :59.65        Mean   :50.92           Mean   :55.88        
##  3rd Qu.:70.00        3rd Qu.:65.00           3rd Qu.:66.00        
##  Max.   :95.00        Max.   :91.00           Max.   :96.00        
##  mentality_vision mentality_penalties mentality_composure
##  Min.   :13.00    Min.   :13.00       Min.   :30.00      
##  1st Qu.:48.00    1st Qu.:42.00       1st Qu.:53.00      
##  Median :58.00    Median :51.00       Median :61.00      
##  Mean   :56.35    Mean   :51.85       Mean   :60.57      
##  3rd Qu.:66.00    3rd Qu.:61.00       3rd Qu.:68.00      
##  Max.   :95.00    Max.   :93.00       Max.   :96.00      
##  defending_marking_awareness defending_standing_tackle defending_sliding_tackle
##  Min.   :10.00               Min.   :10.00             Min.   :10.00           
##  1st Qu.:37.00               1st Qu.:37.00             1st Qu.:34.00           
##  Median :55.00               Median :59.00             Median :56.00           
##  Mean   :51.04               Mean   :52.63             Mean   :50.22           
##  3rd Qu.:65.00               3rd Qu.:67.00             3rd Qu.:65.00           
##  Max.   :93.00               Max.   :93.00             Max.   :92.00           
##  goalkeeping_diving goalkeeping_handling goalkeeping_kicking
##  Min.   : 2.00      Min.   : 2.00        Min.   : 2.00      
##  1st Qu.: 8.00      1st Qu.: 8.00        1st Qu.: 8.00      
##  Median :10.00      Median :10.00        Median :10.00      
##  Mean   :10.34      Mean   :10.37        Mean   :10.38      
##  3rd Qu.:13.00      3rd Qu.:13.00        3rd Qu.:13.00      
##  Max.   :29.00      Max.   :33.00        Max.   :31.00      
##  goalkeeping_positioning goalkeeping_reflexes
##  Min.   : 2.00           Min.   : 2.00       
##  1st Qu.: 8.00           1st Qu.: 8.00       
##  Median :10.00           Median :10.00       
##  Mean   :10.37           Mean   :10.33       
##  3rd Qu.:13.00           3rd Qu.:13.00       
##  Max.   :33.00           Max.   :37.00
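The two data frames whose dimensions and summary appear above might be built roughly as follows. This is a sketch on a toy table: the player names are placeholders, and `players_features` is a hypothetical name for the feature table.

```r
# Toy stand-in for the filtered player table; the values are illustrative only.
players <- data.frame(
  short_name    = c("P1", "P2", "P3", "P4"),
  club_position = c("ST", "LCB", "RCM", "SUB"),
  pace          = c(85, 60, NA, 70),
  shooting      = c(92, 35, 70, 55),
  defending     = c(34, 85, 65, NA),
  stringsAsFactors = FALSE
)

feature_cols <- c("pace", "shooting", "defending")
info_cols    <- c("short_name", "club_position")

# Keep only the rows with no NA in any clustering feature.
keep <- complete.cases(players[, feature_cols])

# The same row mask is applied to both tables so that they stay aligned.
players_features <- players[keep, feature_cols]
players_info     <- players[keep, info_cols]
dim(players_features)
```

Applying one mask to both tables keeps row i of the features and row i of the info table pointing at the same player.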

As we can see, many of the plotted distributions are bimodal, for example defending and the attacking attributes, which may be a good sign that the clustering will work (perhaps even under a normal model).

Factor analysis

Let us try to reduce the dimensionality of the feature space and look at the biplot.
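Bimodality can also be checked numerically rather than by eye. The sketch below uses synthetic data (a mixture of two normal groups standing in for a rating such as defending) and counts the local maxima of a kernel density estimate; the 10% height threshold is an arbitrary choice made for this illustration.

```r
set.seed(1)

# Synthetic stand-in for a rating: a mix of two well-separated player groups.
rating <- c(rnorm(1000, mean = 30, sd = 5), rnorm(1000, mean = 70, sd = 5))

# Kernel density estimate with the default Gaussian kernel and bandwidth.
dens <- density(rating)
y <- dens$y

# Interior grid points where the density switches from rising to falling.
peaks <- which(diff(sign(diff(y))) == -2) + 1

# Count only the peaks that are not negligible compared with the highest one.
n_modes <- sum(y[peaks] > 0.1 * max(y))
n_modes
```

On real ratings the number of detected modes depends on the bandwidth, so such a count is a hint rather than a test.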


Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
pace 0.4803675 -0.3412889 0.5521689 0.5362830 -0.0038420
shooting 0.9031861 -0.1770167 -0.3129565 0.0110116 -0.0063931
passing 0.8833757 0.3275755 0.1276242 -0.2264294 -0.0046164
dribbling 0.9432781 0.0044614 0.1485479 -0.0024266 -0.0286186
defending -0.1751087 0.9237815 0.2755775 -0.0831520 -0.0009376
physic 0.0837918 0.7716509 -0.3182989 0.4479501 -0.0157178
attacking_crossing 0.7481943 0.1448143 0.3218917 -0.1297755 0.0435817
attacking_finishing 0.8358671 -0.2995539 -0.3078979 0.0439884 -0.0070833
attacking_heading_accuracy 0.0475299 0.5399340 -0.5331219 0.3765858 -0.0575140
attacking_short_passing 0.7377009 0.4873940 0.0389338 -0.1629982 -0.0458902
attacking_volleys 0.8232193 -0.1696750 -0.3152016 0.0049402 -0.0004373
skill_dribbling 0.9111194 -0.0528693 0.1094648 -0.0209284 -0.0343958
skill_curve 0.8575685 0.0360943 0.0376135 -0.1682756 0.0177360
skill_fk_accuracy 0.7556856 0.0770102 -0.0303190 -0.3009694 0.0483341
skill_long_passing 0.6035401 0.5489778 0.1500831 -0.3025632 -0.0211363
skill_ball_control 0.8846212 0.2003612 -0.0024331 -0.0373661 -0.0491033
movement_acceleration 0.5004925 -0.3598370 0.5712761 0.4488248 0.0042207
movement_sprint_speed 0.4347063 -0.3050335 0.5026461 0.5759626 -0.0102871
movement_agility 0.6814319 -0.2551427 0.4719116 0.1567143 0.0412622
movement_reactions 0.6286497 0.5471269 -0.1215489 0.1392700 -0.0321509
movement_balance 0.4869166 -0.2736985 0.5580139 -0.0628797 0.0471450
power_shot_power 0.8115880 0.0725762 -0.3239688 0.0168331 -0.0231424
power_jumping -0.0051228 0.3817842 -0.1027907 0.5724873 0.0264658
power_stamina 0.3800651 0.4967675 0.1686971 0.3596492 0.0260374
power_strength -0.0745845 0.5887408 -0.5305355 0.4068123 -0.0367999
power_long_shots 0.8776214 -0.0496767 -0.2322783 -0.0732929 0.0073395
mentality_aggression 0.0855554 0.8016419 -0.0621234 0.1722462 -0.0042107
mentality_interceptions -0.1404271 0.8940623 0.2938862 -0.1041136 0.0095903
mentality_positioning 0.8732683 -0.1566944 -0.1149013 0.0448532 0.0034783
mentality_vision 0.8774478 0.1018037 0.0072120 -0.2056271 -0.0093744
mentality_penalties 0.7206516 -0.1555862 -0.3966560 -0.0286951 0.0007707
mentality_composure 0.7114847 0.4449490 -0.1567992 0.0252829 -0.0340337
defending_marking_awareness -0.1584522 0.8846805 0.2890390 -0.0946016 0.0025191
defending_standing_tackle -0.2008635 0.8744922 0.3326012 -0.1256241 0.0000211
defending_sliding_tackle -0.2363179 0.8515204 0.3566346 -0.1167522 0.0015928
goalkeeping_diving 0.0212048 0.0504192 -0.0397377 0.0065109 0.4959255
goalkeeping_handling 0.0299829 0.0505023 -0.0453224 0.0122532 0.4947078
goalkeeping_kicking 0.0537447 0.0480149 -0.0456355 0.0369209 0.4912717
goalkeeping_positioning 0.0364707 0.0570117 -0.0684556 0.0035093 0.4429337
goalkeeping_reflexes 0.0412314 0.0430386 -0.0506463 0.0101047 0.5012007

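The first factor apparently characterizes attacking play (shooting, passing and dribbling all load at about 0.9 on it), and the second one characterizes defense (defending loads at 0.92, mentality_interceptions at 0.89).

We also build a biplot and locate some players on it to interpret the result.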
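A table of variable coordinates like the one above can be obtained from a principal component analysis on standardized features. The sketch below uses base R's `prcomp` as a stand-in (the report's own code is not shown) and synthetic data driven by two hypothetical latent traits, `attack` and `defence`.

```r
set.seed(1)
n <- 300

# Two hypothetical latent traits that drive the observed ratings.
attack  <- rnorm(n)
defence <- rnorm(n)

# Synthetic ratings named after real columns; the values are made up.
X <- data.frame(
  shooting                = attack  + rnorm(n, sd = 0.3),
  dribbling               = attack  + rnorm(n, sd = 0.3),
  defending               = defence + rnorm(n, sd = 0.3),
  mentality_interceptions = defence + rnorm(n, sd = 0.3)
)

# PCA on centered and scaled variables.
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Variable coordinates: each loading column multiplied by the component's
# standard deviation, which equals the variable-component correlation.
var_coord <- sweep(pca$rotation, 2, pca$sdev, `*`)
round(var_coord[, 1:2], 2)
```

Because the variables are scaled, each coordinate is a correlation between a variable and a component, which is what makes values near 0.9 easy to interpret.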

index short_name club_position
1 L. Messi RW
4 Neymar Jr LW
3 Cristiano Ronaldo ST
5 K. De Bruyne RCM
29 M. Verratti LCM
15 J. Kimmich RDM
23 S. Agüero ST
16 Sergio Ramos LCB
61 E. Cavani SUB
47 R. Mahrez RW
56 Rodri CDM
115 L. Sané LM

It cannot be said that the result is clear-cut (because of the midfielders). With Messi (1), Neymar (4) and De Bruyne (5, an attacking midfielder) everything is logical: they are in attack.
Mahrez (47) currently plays as a wide midfielder (winger); besides joining attacks, such midfielders are expected to defend their zones against runs by the opponent's full-backs and holding midfielders. Kimmich (15) is a midfielder too. Agüero (23), however, is, strictly speaking, a forward. Rodri (56) is a holding midfielder, so it is a little strange that he is where he is. Still, the overall logic is there.

By the way, players are numbered by overall rating, and we can clearly see that the indices in the lower left are large.

We can clearly see clouds of points: either two large ones or three smaller ones.

Clustering

SOM

It is good that the players are spread evenly across the cells. The map may need to be enlarged so that fewer individuals fall into each cell, but we leave it as is for now.
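How evenly observations fall into the cells can be checked by counting hits per unit. The sketch below assumes the `kohonen` package (not named in the report) and uses random data on a 5 x 5 hexagonal grid chosen only for illustration; the report's actual grid size is not shown.

```r
# Runs only if the kohonen package is installed (an assumption of this sketch).
if (requireNamespace("kohonen", quietly = TRUE)) {
  set.seed(1)

  # Random standardized data standing in for the player features.
  X <- scale(matrix(rnorm(500 * 4), ncol = 4))

  # Illustrative 5 x 5 hexagonal map.
  som_model <- kohonen::som(
    X, grid = kohonen::somgrid(xdim = 5, ydim = 5, topo = "hexagonal")
  )

  # Observations per unit; the factor levels keep empty units as zero counts.
  counts <- table(factor(som_model$unit.classif, levels = 1:25))
  print(range(counts))
}
```

A narrow range of counts indicates an even spread; a larger grid lowers the counts per cell at the cost of more empty units.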

% TODO: write down some explanation of what we see

## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'J. Iličić' in 'mbcsToSbcs': dot substituted for <87>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), : font
## metrics unknown for Unicode character U+010d
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), : font
## metrics unknown for Unicode character U+0107
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'J. Iličić' in 'mbcsToSbcs': dot substituted for <c4>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'J. Iličić' in 'mbcsToSbcs': dot substituted for <8d>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'J. Iličić' in 'mbcsToSbcs': dot substituted for <c4>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'J. Iličić' in 'mbcsToSbcs': dot substituted for <87>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'S. Savić' in 'mbcsToSbcs': dot substituted for <c4>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'S. Savić' in 'mbcsToSbcs': dot substituted for <87>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), : font
## metrics unknown for Unicode character U+0107
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'S. Savić' in 'mbcsToSbcs': dot substituted for <c4>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'S. Savić' in 'mbcsToSbcs': dot substituted for <87>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'L. Modrić' in 'mbcsToSbcs': dot substituted for <c4>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'L. Modrić' in 'mbcsToSbcs': dot substituted for <87>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), : font
## metrics unknown for Unicode character U+0107
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'M.Škriniar' in 'mbcsToSbcs': dot substituted for <c5>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'M.Škriniar' in 'mbcsToSbcs': dot substituted for <a0>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), : font
## metrics unknown for Unicode character U+0160
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'M.Škriniar' in 'mbcsToSbcs': dot substituted for <c5>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'M.Škriniar' in 'mbcsToSbcs': dot substituted for <a0>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'I.Gündoğan' in 'mbcsToSbcs': dot substituted for <c4>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'I.Gündoğan' in 'mbcsToSbcs': dot substituted for <9f>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), : font
## metrics unknown for Unicode character U+011f
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'SMlnkvć-Sv' in 'mbcsToSbcs': dot substituted for <c4>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'SMlnkvć-Sv' in 'mbcsToSbcs': dot substituted for <87>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), : font
## metrics unknown for Unicode character U+0107
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'ZIbrahimvć' in 'mbcsToSbcs': dot substituted for <c4>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'ZIbrahimvć' in 'mbcsToSbcs': dot substituted for <87>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), : font
## metrics unknown for Unicode character U+0107
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'D. Tadić' in 'mbcsToSbcs': dot substituted for <c4>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'D. Tadić' in 'mbcsToSbcs': dot substituted for <87>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), : font
## metrics unknown for Unicode character U+0107
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'J. Iličić' in 'mbcsToSbcs': dot substituted for <c4>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'J. Iličić' in 'mbcsToSbcs': dot substituted for <8d>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'J. Iličić' in 'mbcsToSbcs': dot substituted for <c4>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'J. Iličić' in 'mbcsToSbcs': dot substituted for <87>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), : font
## metrics unknown for Unicode character U+010d
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), : font
## metrics unknown for Unicode character U+0107
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'S. Savić' in 'mbcsToSbcs': dot substituted for <c4>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'S. Savić' in 'mbcsToSbcs': dot substituted for <87>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), : font
## metrics unknown for Unicode character U+0107
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'S. Savić' in 'mbcsToSbcs': dot substituted for <c4>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'S. Savić' in 'mbcsToSbcs': dot substituted for <87>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'L. Modrić' in 'mbcsToSbcs': dot substituted for <c4>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'L. Modrić' in 'mbcsToSbcs': dot substituted for <87>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), : font
## metrics unknown for Unicode character U+0107
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'M.Škriniar' in 'mbcsToSbcs': dot substituted for <c5>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'M.Škriniar' in 'mbcsToSbcs': dot substituted for <a0>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), : font
## metrics unknown for Unicode character U+0160
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'M.Škriniar' in 'mbcsToSbcs': dot substituted for <c5>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'M.Škriniar' in 'mbcsToSbcs': dot substituted for <a0>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'I.Gündoğan' in 'mbcsToSbcs': dot substituted for <c4>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'I.Gündoğan' in 'mbcsToSbcs': dot substituted for <9f>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), : font
## metrics unknown for Unicode character U+011f
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'SMlnkvć-Sv' in 'mbcsToSbcs': dot substituted for <c4>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'SMlnkvć-Sv' in 'mbcsToSbcs': dot substituted for <87>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), : font
## metrics unknown for Unicode character U+0107
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'ZIbrahimvć' in 'mbcsToSbcs': dot substituted for <c4>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'ZIbrahimvć' in 'mbcsToSbcs': dot substituted for <87>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), : font
## metrics unknown for Unicode character U+0107
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'D. Tadić' in 'mbcsToSbcs': dot substituted for <c4>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'D. Tadić' in 'mbcsToSbcs': dot substituted for <87>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), : font
## metrics unknown for Unicode character U+0107
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'J. Iličić' in 'mbcsToSbcs': dot substituted for <c4>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'J. Iličić' in 'mbcsToSbcs': dot substituted for <8d>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'J. Iličić' in 'mbcsToSbcs': dot substituted for <c4>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'J. Iličić' in 'mbcsToSbcs': dot substituted for <87>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), : font
## metrics unknown for Unicode character U+010d
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), : font
## metrics unknown for Unicode character U+0107
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'S. Savić' in 'mbcsToSbcs': dot substituted for <c4>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'S. Savić' in 'mbcsToSbcs': dot substituted for <87>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), : font
## metrics unknown for Unicode character U+0107
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'S. Savić' in 'mbcsToSbcs': dot substituted for <c4>
## Warning in text.default(x$grid$pts[classif, 1] + rnorm(length(classif), :
## conversion failure on 'S. Savić' in 'mbcsToSbcs': dot substituted for <87>

Model-based example: mclust

Since this method has come up before, we go through it briefly.

We use the well-known mclust package, which fits many candidate clustering models at once.

## Package 'mclust' version 5.4.5
## Type 'citation("mclust")' for citing this R package in publications.
## 
## Attaching package: 'mclust'
## The following object is masked from 'package:kohonen':
## 
##     map

Here we look at the choice made by the Mclust() function. The choice is based on two computed model-quality criteria, BIC and ICL. Because BIC and ICL are computed from a sample and are therefore random quantities, it is worth checking which other clusterings have similar values.

The method selected a partition into 9 clusters, which is too many for our purposes (that part was removed to save run time).

Our choice rests on a combination of factors:

  1. The value of the model's Bayesian information criterion (BIC); higher is better. Since BIC is a random variable, it makes sense to consider several models with similar BIC but different numbers of clusters and parameters.

  2. The number of clusters and the number of estimated parameters. For comparable BIC we prefer the simplest model, the one with the fewest estimated parameters: the fewer parameters there are to estimate, the lower the variance of the estimates at a fixed sample size. We would like to end up with at most four clusters.

Let us look at all the BIC values and plot them.

     EII      VII      EEI      VEI      EVI      VVI      EEE      EVE      VEE      VVE      EEV      VEV      EVV      VVV
-4176528 -4176528 -4009780 -4009780 -4009780 -4009780 -2971174 -2971174 -2971174 -2971174 -2971174 -2971174 -2971174 -2971174
-4020284 -4014455 -3862532 -3862244 -3855064 -3851498 -2967176 -2943646 -2964013 -2943784 -2928977 -2928974 -2927618 -2927619
-3889482 -3885953 -3779774 -3777249 -3763502 -3767724 -2957998 -2931667 -2954104 -2929064 -2922661 -2920979 -2921040 -2918752
-3836340 -3835749 -3727901 -3727359 -3709331 -3709088 -2951693 -2927433 -2948048 -2923912 -2921751 -2920219 -2919935 -2917594

We chose the VVV model with 3 components.

## ---------------------------------------------------- 
## Gaussian finite mixture model fitted by EM algorithm 
## ---------------------------------------------------- 
## 
## Mclust VVV (ellipsoidal, varying volume, shape, and orientation) model with 3
## components: 
## 
##  log-likelihood     n   df      BIC      ICL
##        -1447144 13193 2582 -2918785 -2919823
## 
## Clustering table:
##    1    2    3 
## 4205 3721 5267
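
The selection rule described above (take the models whose BIC is close to the best one, then prefer the one with the fewest parameters) can be sketched in base R. This is only an illustration under stated assumptions, not the code used in the report: the rows of the BIC table are assumed to correspond to 1 to 4 components, the dimension d = 40 is inferred from df = 2582 in the summary rather than stated anywhere, and both the `tolerance` threshold and the helper `n_params_vvv` are hypothetical.

```r
# Number of free parameters of a G-component Gaussian mixture in d dimensions
# with unrestricted covariance matrices (the VVV structure).
n_params_vvv <- function(d, G) {
  means <- G * d
  covariances <- G * d * (d + 1) / 2
  proportions <- G - 1
  means + covariances + proportions
}

# BIC values from the VVV column of the table above.
# Assumption: the four rows correspond to G = 1, 2, 3, 4 components.
candidates <- data.frame(
  G = 1:4,
  BIC = c(-2971174, -2927619, -2918752, -2917594)
)

# Assumption: d = 40 is inferred from df = 2582, it is not stated in the report.
d <- 40
candidates$df <- n_params_vvv(d, candidates$G)

# Hypothetical tolerance: BIC gaps smaller than this are treated as noise.
tolerance <- 2000
close_to_best <- candidates[max(candidates$BIC) - candidates$BIC <= tolerance, ]

# Among the near-best models, keep the one with the fewest parameters.
chosen <- close_to_best[which.min(close_to_best$df), ]
print(chosen)
```

With this particular tolerance the rule picks 3 components, but the threshold is a judgment call and a different value could change the outcome.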

Based on our assumptions and the principal-component plot, we can name the classes.

## 
##         1         2         3 
## 0.3187296 0.2820435 0.3992269

Uncertainty

Cluster quality can be assessed with the uncertainty measure, computed as one minus the probability of the most probable class. It gives a good picture of how much the classes overlap. Let us look at several quantiles.

##          60%          70%          80%          90%          95%        97.5% 
## 0.0002221314 0.0022168335 0.0170623789 0.1055412775 0.2560235155 0.3721556604 
##          99%        99.5% 
## 0.4529252851 0.4772857173
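
As a minimal sketch of the definition above, the uncertainty can be computed directly from a matrix of posterior membership probabilities. The matrix `z` below is a hypothetical toy example, not the report's data; in a fitted mclust model the corresponding quantities are stored in the `z` and `uncertainty` components.

```r
# Hypothetical posterior membership probabilities:
# one row per player, one column per cluster, each row sums to 1.
z <- matrix(
  c(0.98, 0.01, 0.01,
    0.50, 0.45, 0.05,
    0.10, 0.20, 0.70),
  ncol = 3, byrow = TRUE
)

# Uncertainty: one minus the probability of the most probable class.
uncertainty <- 1 - apply(z, 1, max)
print(uncertainty)

# The same quantile levels as in the output above.
probs <- c(0.6, 0.7, 0.8, 0.9, 0.95, 0.975, 0.99, 0.995)
print(quantile(uncertainty, probs = probs))
```

A player assigned with probability 0.98 gets an uncertainty close to zero, while one split 0.50 / 0.45 between two classes sits near the upper end of the scale.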

To make the methods comparable, we compute the ratio within SS / between SS.

## [1] 0.3897062
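
The ratio can be computed along the following lines. This is a sketch that assumes the usual definitions (within SS is the sum of squared deviations from cluster centroids, between SS is total SS minus within SS); the helper `ss_ratio` and the toy data are hypothetical, and the report may have computed the figure differently.

```r
# Ratio of within-cluster to between-cluster sum of squares.
ss_ratio <- function(x, cluster) {
  x <- as.matrix(x)
  # Total SS: squared deviations from the overall centroid.
  total_ss <- sum(sweep(x, 2, colMeans(x))^2)
  # Within SS: squared deviations from each cluster's own centroid.
  within_ss <- sum(sapply(split(seq_len(nrow(x)), cluster), function(idx) {
    block <- x[idx, , drop = FALSE]
    sum(sweep(block, 2, colMeans(block))^2)
  }))
  between_ss <- total_ss - within_ss
  within_ss / between_ss
}

# Toy data: two tight, well separated groups on a line.
x <- matrix(c(0, 1, 10, 11), ncol = 1)
cluster <- c(1, 1, 2, 2)
print(ss_ratio(x, cluster))
```

Smaller values mean tighter and better separated clusters, so the same function applied to the labels from different methods gives a common scale for comparison.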

World

Let us look at individual players in the table:

rate name position mclust
1 L. Messi RW Attack
3 Cristiano Ronaldo ST Attack
4 Neymar Jr LW Attack
5 K. De Bruyne RCM Attack
15 Casemiro CDM Defence
18 M. Salah RW Attack
20 J. Kimmich RDM Defence
23 Sergio Ramos LCB Midfielder
31 S. Agüero ST Attack
33 L. Modrić RCM Defence
39 M. Verratti LCM Defence
40 Marquinhos RCB Midfielder
47 Rúben Dias RCB Midfielder
48 G. Chiellini SUB Midfielder
53 Sergio Busquets CDM Defence
59 R. Mahrez RW Attack
68 Rodri CDM Defence
75 E. Cavani SUB Attack
97 M. de Ligt LCB Midfielder
99 Jesús Navas RB Defence
100 Piqué LCB Midfielder
133 L. Sané LM Attack

Russia

rate name position mclust
221 Mário Fernandes RB Defence
617 A. Golovin LF Defence
759 R. Zobnin RDM Defence
766 A. Miranchuk SUB Attack
1034 G. Dzhikiya LCB Midfielder
1180 F. Smolov RS Attack
1181 A. Dzagoev CAM Defence
1251 D. Cheryshev LM Attack
1325 A. Miranchuk SUB Attack
1561 A. Kokorin SUB Attack
1806 D. Barinov RCM Defence
1890 A. Sobolev ST Attack
2032 G. Schennikov SUB Defence
2283 Z. Bakaev SUB Attack
2306 R. Zhemaletdinov RM Attack
2320 F. Chalov ST Attack
2932 I. Oblyakov LB Attack
3039 I. Diveev RCB Midfielder
3166 V. Vasin SUB Midfielder
3509 I. Akhmetov RDM Defence
3542 D. Zhivoglyadov SUB Defence
3544 S. Iljutcenko ST Attack
3694 K. Kuchaev RM Attack
3877 F. Kudryashov SUB Midfielder
4078 I. Kutepov SUB Midfielder
4215 R. Mirzov SUB Attack
4521 N. Rasskazov RB Midfielder
4554 S. Magkeev RCB Midfielder
4621 K. Nababkin SUB Defence
4624 A. Eschenko SUB Midfielder
4714 A. Zabolotnyi SUB Attack
5338 D. Kulikov LCM Defence
5369 D. Rybchinskiy LM Attack
5370 N. Umyarov SUB Defence
6282 K. Maradishvili RES Defence
6283 P. Maslov RES Midfielder
6403 A. Silyanov RB Midfielder
6539 E. Bashkirov RDM Defence
7437 M. Mukhin LDM Attack
8255 M. Suleymanov SUB Attack
9347 I. Zhigulev SUB Defence
9507 N. Tiknizyan RES Defence
9518 A. Lomovitskiy SUB Attack
10292 G. Melkadze SUB Attack
10540 I. Shinozuka RES Attack
10694 M. Ignatov SUB Attack
10746 I. Gaponov RES Midfielder
11038 M. Nenakhov RES Midfielder
11764 L. Klassen LB Defence
11938 V. Karpov RES Midfielder
13233 E. Shlyakov LB Midfielder
13240 E. Sevikyan RES Attack
14148 N. Iosifov RES Attack
14366 S. Babkin SUB Midfielder
14393 V. Yakovlev RES Attack
16890 V. Cherny SUB Attack
17597 D. Markitesov RES Midfielder
18853 I. Repyakh RES Attack

On the choice of distance

##              1         2      1180        23        20      3877
## 1    1.0000000 0.9190866 0.9353127 0.5348853 0.6496237 0.4511132
## 2    0.9190866 1.0000000 0.9659547 0.6823704 0.6801039 0.5521709
## 1180 0.9353127 0.9659547 1.0000000 0.5991011 0.6429120 0.5377683
## 23   0.5348853 0.6823704 0.5991011 1.0000000 0.8646511 0.8626866
## 20   0.6496237 0.6801039 0.6429120 0.8646511 1.0000000 0.8697465
## 3877 0.4511132 0.5521709 0.5377683 0.8626866 0.8697465 1.0000000
##              1         2      1180        23        20
## 2     76.37408                                        
## 1180  96.22370  75.35250                              
## 23   171.58380 133.16156 149.67966                    
## 20   150.03999 133.53277 144.99310  80.51708          
## 3877 195.78304 169.65848 135.03333 118.25396 121.43723
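
The two outputs above look like a correlation matrix between players and a `dist()` object. A minimal sketch of how such a pair can be produced in base R follows; the attribute values are hypothetical, and it is an assumption that the second matrix in the report is Euclidean.

```r
# Hypothetical attribute profiles: rows are players, columns are attributes.
players <- rbind(
  forward_a = c(90, 85, 40, 35),
  forward_b = c(80, 78, 30, 28),
  defender  = c(40, 35, 88, 90)
)

# cor() correlates columns, so transpose to correlate players (rows).
similarity <- cor(t(players))
print(round(similarity, 3))

# Euclidean distance between the same players.
euclid <- dist(players, method = "euclidean")
print(euclid)

# A correlation can be turned into a dissimilarity for clustering.
cor_dissimilarity <- as.dist(1 - similarity)
print(cor_dissimilarity)
```

Correlation compares the shape of a player's profile and ignores its overall level, whereas Euclidean distance also separates a strong and a weak player with the same profile shape.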

Hierarchical clustering

TODO: add entanglement https://uc-r.github.io/hc_clustering

## classes_hclust
##     Attack Midfielder    Defence 
##  0.3904343  0.4235579  0.1860077

The Defence cluster is now considerably smaller than it was with mclust.

Let us look at the biplot and check that the result is broadly similar to what we saw earlier.
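
For reference, a hierarchical clustering on a correlation-based dissimilarity can be sketched as below. The toy profiles are hypothetical, and average linkage is an assumption, since the report does not say which linkage was used.

```r
# Hypothetical attribute profiles: rows are players, columns are attributes.
players <- rbind(
  att_1 = c(90, 85, 40, 35),
  att_2 = c(80, 78, 30, 28),
  mid_1 = c(50, 90, 90, 50),
  mid_2 = c(55, 88, 92, 52),
  def_1 = c(40, 35, 88, 90),
  def_2 = c(35, 30, 80, 85)
)

# Dissimilarity between players: one minus the correlation of their profiles.
d <- as.dist(1 - cor(t(players)))

# Agglomerative clustering; the linkage method is an assumption.
tree <- hclust(d, method = "average")

# Cut the tree into three clusters and show the cluster shares.
labels <- cutree(tree, k = 3)
print(labels)
print(prop.table(table(labels)))
```

The cluster numbers returned by `cutree()` carry no meaning of their own, so names such as Attack, Midfielder and Defence have to be assigned afterwards by inspecting the members.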

Quality

Again, note that these clusters were built with a different objective (one based on the correlation between individuals), so the value reported below is not strictly comparable.

## [1] 0.3015825

World

rate name position hclust
1 L. Messi RW Attack
3 Cristiano Ronaldo ST Attack
4 Neymar Jr LW Attack
5 K. De Bruyne RCM Attack
15 Casemiro CDM Defence
18 M. Salah RW Attack
20 J. Kimmich RDM Midfielder
23 Sergio Ramos LCB Defence
31 S. Agüero ST Attack
33 L. Modrić RCM Attack
39 M. Verratti LCM Midfielder
40 Marquinhos RCB Defence
47 Rúben Dias RCB Defence
48 G. Chiellini SUB Defence
53 Sergio Busquets CDM Defence
59 R. Mahrez RW Attack
68 Rodri CDM Defence
75 E. Cavani SUB Attack
97 M. de Ligt LCB Defence
99 Jesús Navas RB Attack
100 Piqué LCB Defence
133 L. Sané LM Attack

Russia

rate name position hclust
221 Mário Fernandes RB Midfielder
617 A. Golovin LF Midfielder
759 R. Zobnin RDM Midfielder
766 A. Miranchuk SUB Attack
1034 G. Dzhikiya LCB Midfielder
1180 F. Smolov RS Attack
1181 A. Dzagoev CAM Midfielder
1251 D. Cheryshev LM Attack
1325 A. Miranchuk SUB Attack
1561 A. Kokorin SUB Attack
1806 D. Barinov RCM Defence
1890 A. Sobolev ST Attack
2032 G. Schennikov SUB Midfielder
2283 Z. Bakaev SUB Attack
2306 R. Zhemaletdinov RM Attack
2320 F. Chalov ST Attack
2932 I. Oblyakov LB Attack
3039 I. Diveev RCB Defence
3166 V. Vasin SUB Defence
3509 I. Akhmetov RDM Attack
3542 D. Zhivoglyadov SUB Midfielder
3544 S. Iljutcenko ST Attack
3694 K. Kuchaev RM Midfielder
3877 F. Kudryashov SUB Defence
4078 I. Kutepov SUB Defence
4215 R. Mirzov SUB Midfielder
4521 N. Rasskazov RB Midfielder
4554 S. Magkeev RCB Defence
4621 K. Nababkin SUB Midfielder
4624 A. Eschenko SUB Midfielder
4714 A. Zabolotnyi SUB Attack
5338 D. Kulikov LCM Midfielder
5369 D. Rybchinskiy LM Midfielder
5370 N. Umyarov SUB Midfielder
6282 K. Maradishvili RES Attack
6283 P. Maslov RES Midfielder
6403 A. Silyanov RB Midfielder
6539 E. Bashkirov RDM Midfielder
7437 M. Mukhin LDM Midfielder
8255 M. Suleymanov SUB Attack
9347 I. Zhigulev SUB Midfielder
9507 N. Tiknizyan RES Midfielder
9518 A. Lomovitskiy SUB Attack
10292 G. Melkadze SUB Attack
10540 I. Shinozuka RES Attack
10694 M. Ignatov SUB Attack
10746 I. Gaponov RES Midfielder
11038 M. Nenakhov RES Midfielder
11764 L. Klassen LB Midfielder
11938 V. Karpov RES Midfielder
13233 E. Shlyakov LB Midfielder
13240 E. Sevikyan RES Attack
14148 N. Iosifov RES Attack
14366 S. Babkin SUB Midfielder
14393 V. Yakovlev RES Attack
16890 V. Cherny SUB Attack
17597 D. Markitesov RES Midfielder
18853 I. Repyakh RES Midfielder

DBSCAN

##TODO